Using cryptography to protect information and communication has
basically two major drawbacks. First, the specific entropy profile of encrypted data makes their detection very easy. Second, the use of cryptography can be more or less regulated, not to say forbidden, depending on the country. While the right to freely protect our personal and private data is a fundamental right, it must not hinder the action of Nation States with respect to national security. Allowing citizens to use encryption allows bad guys to use it as well.

In this paper we propose a new approach in information and communication security that may solve all these issues, thus representing a rather interesting trade-off between apparently opposite security needs. We introduce the concept of scalable security, based on a computationally hard problem of coding theory, with the Perseus technology.

The core idea is to encode data with variable punctured convolutional codes in such a way that any cryptanalytic attempt will require a time-consuming encoder reconstruction in order to decode. By adding noise in a suitable way, that reconstruction becomes intractable in practice except for intelligence services, which however must use supercomputers during a significant, scalable amount of time. Hence it naturally limits any will to unduly perform such attacks (e.g. against citizens' privacy).

On the users' side, encoder and noise parameters are first exchanged through an initial, short HTTPS session. The principles behind that approach were mathematically validated in 1997 and 2007. We present the Perseus library we have developed under the triple GPL/LGPL/MPL licences. This library can be used to protect any kind of data.

Keywords: Communication security - Coding theory - Code reconstruction 
- Traffic eavesdropping - Encryption. 



This work was presented at iAWACS 2010.
1 Introduction



A necessary - but not sufficient - condition for cryptographic security lies in the secret key size. Cryptography is itself defined as the use of a secret quantity - the key - while coding uses open, widely known mathematical objects without any secret quantity.

The main issue is then: can cryptography be characterized by the presence of a secret quantity only? While it is a necessary condition, it is not a sufficient one. A deep and careful analysis of the cryptographic laws of most countries (and international organizations) shows that the "legal" definition of what is crypto and what is not relates directly to the following (noise) probability

$$P[c_t = m_t \oplus e_t] = P[e_t = 1]$$

where $c_t$ and $m_t$ are the ciphertext and plaintext bits respectively and where $e_t$ can be defined as the noise bit produced by the key of the cryptosystem¹ (at time instant $t$). Then, if $P[e_t = 1] = \frac{1}{2} \pm \epsilon$ with $\epsilon$ very close to zero, it is cryptography; otherwise ($\epsilon$ significantly different from 0) it is coding theory. But are the differences between cryptography and coding theory so easy to define? Known cryptanalysis techniques intend to deal with the first case more or less efficiently. On the other hand, there are a lot of decoding problems that are computationally hard.

In this paper we consider such a computationally hard problem in order to provide a new information and communication protection scheme whose security level is scalable. We have called it Perseus² technology and we present here the open source library we have developed to protect any kind of data and protocols.

Perseus technology's core idea is to encode data with punctured convolutional codes. Those codes are commonly used in telecommunications (GSM, satellite...) due to their very high encoding speed and their high correcting power. After this encoding layer and right before transmission, an artificial noise is applied to the data flow (as any channel would do). The noise is generated according to the noise parameter $p = P[e_t = 1]$ where $e_t$ is the noise bit at time instant $t$. The value of $p$ is around 0.3. Since the convolutional encoder changes very frequently, the attacker always has first to reconstruct the encoder in order to be able to decode. This reconstruction has been proven to be a computationally hard problem [3, 6, 7, 8]. By scalable we mean that, while it is always possible to break PERSEUS-protected data, the difficulty can be tuned in order to require more or less computational effort: from a few days to a few months on a supercomputer. In addition, only an equivalent, non-punctured encoder can be recovered [2]. However this problem may still remain tractable for any intelligence agency with suitable computing power.

¹ This also holds for block ciphers, where the "effect" of the key on the plaintext block can be formalized in this way.

² Perseus is the hero of Greek mythology who killed the Gorgon Medusa [20]. The botnets, against which Perseus technology was initially designed [5], are themselves often compared to Medusa and its long tentacles.
The different parameters of the variable encoders are randomly generated:
polynomial size constraint, encoding rate, puncturing matrix, noise parameter p, encoder polynomials... Then a short initial HTTPS session is used to communicate those parameters (about 256 bytes) to the recipient. The recipient, and only the recipient, is then able first to remove the artificial deterministic noise and then to set up the suitable Viterbi algorithm for data decoding.

What is the interest of using scalable security when, generally, only strong, unbreakable cryptography offers real security? Why would users prefer Perseus technology to strong cryptography? On the other hand, why would existing national or international regulations tolerate the use of this technology? There are in fact many reasons to favour the Perseus approach over strong encryption.

• The use of encryption, besides the fact that it would lead to severe constraints (encryption overhead, key management...), poses problems in terms of legal regulations, especially in the context of transnational streams subject to different national regulations. A critical issue then arises: how can we protect our personal and private data while still allowing the necessary action of States (for national security for instance) in the field of communication surveillance, and without lessening the transmission rate significantly? The scalable security offered by Perseus provides such a trade-off very efficiently. Any PERSEUS-protected data can be broken provided that a significant amount of supercomputer time is spent. This limits any State's intent to spy on innocent people not involved in terrorism, mafia activities, child pornography... and makes it focus on really bad guys. Moreover the generalization of encryption is not a good thing
as pointed out by the US National Security Agency [El [M] and British 
MI-5 [TT] about HADOPI's French questionable initiative. Favouring the 
use of encryption to protect illegal downloading can severely hinder the 
cryptanalysis activities of States for national security purposes. 

• Why use noisy encoded data instead of encrypted data? Encrypted data by nature exhibit a maximal entropy profile. It is then easy to detect encrypted data. On the contrary, noisy encoded data can exhibit a lower entropy profile which remains closer to that of plain, unencoded data. This lower statistical profile makes it possible to evade detection by an entropy test or any other statistical test, whereas encrypted data cannot.

To summarize these two strong points of Perseus technology, let us consider an illustrative example. John Doe is a US journalist in China. He wants to send a series of papers about China's Human Rights infringements (and about the Chinese 2010 Nobel Peace Prize). His papers, if encrypted, would be blocked by the Chinese authorities when sent to his agency in the USA. On the contrary, papers protected with Perseus will require a significant time to detect (due to the low entropy profile) and to break. The journalist will have time to return safely to the USA.



This paper is organized as follows. Section 2 recalls basic facts about convolutional codes and their reconstruction. Section 3 presents the Perseus library structure while Section 4 deals with its detailed implementation. Section 5 presents the different experimental results we have obtained with respect to final data entropy and performance, while Section 7 concludes by considering future evolution of this library.

2 Theoretical Background 

In this section, we recall what a (punctured or not) convolutional code is, as well as the main results with respect to their reconstruction. The aim is just to provide the reader with the background required to understand the interest of those codes and why they are particularly suitable for our approach. The interested reader may refer to [13] for a more detailed presentation of convolutional codes.

2.1 Convolutional Codes 

A convolutional encoder can be seen as an encoding system (based on a set of $k$ shift registers without feedback) such that, at each time instant, $k$ information digits (typically the bits of data) enter the encoder (one per register). Each information digit remains in the encoder for $K$ time units and may affect each output during that time. The constant $K$ is the constraint length or the memory of the encoder.

At each time instant, $n$ digits are output, each of them resulting from the XOR of $k$ digits produced by the action of $n$ polynomials on each register. The encoder is thus said to be of rate $k/n$. The action of the $kn$ polynomials and the shift are easily described by polynomial multiplications [8]. So the polynomial representation will be used to represent the different streams.

A message will be composed of $k$ interlaced input streams, each of them represented as a polynomial of degree $N + t$ denoted $a_i(x)$, $i = 1, \ldots, k$. The $kn$ polynomials are of degree $N$ (hence $N = K - 1$) and will be denoted $f_{i,j}(x)$. Then the encoder produces $n$ output streams (of length $t$) represented as polynomials of degree $t$, $c_j(x)$, $j = 1, \ldots, n$, and we then have:

$$\sum_{i=1}^{k} a_i(x) f_{i,j}(x) = u_{j,1}(x) + x^{N} c_j(x) + x^{N+t+1} u_{j,2}(x) \qquad (1)$$

The polynomials $u_{j,1}(x)$ (resp. $u_{j,2}(x)$), which describe the filling (resp. the emptying) of the registers, are of degree at most $N - 1$. Then the coded sequence is composed of the $n$ interlaced output streams.
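The encoding rule above can be sketched in C for the smallest interesting case: one input stream, two output streams. This is only an illustration under stated assumptions: the generator polynomials $1 + x^2$ and $1 + x + x^2$, the names `G1`, `G2` and `conv_encode`, and the one-bit-per-byte representation are choices made for this sketch, not the Perseus library's own code or defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative generator polynomials (assumption of this sketch):
 * bit i of each mask is the coefficient of x^i. */
#define G1 0x5u /* 1 + x^2     */
#define G2 0x7u /* 1 + x + x^2 */

/* XOR of all bits of v. */
static unsigned parity(unsigned v)
{
    unsigned p = 0;
    while (v) { p ^= v & 1u; v >>= 1; }
    return p;
}

/* Encodes n_in message bits (one bit per byte, values 0 or 1) into
 * 2 * n_in interleaved output bits: out[2t] comes from G1, out[2t+1]
 * from G2. The register starts at zero; no tail bits are appended. */
static void conv_encode(const unsigned char *in, size_t n_in,
                        unsigned char *out)
{
    unsigned state = 0; /* bit i holds the input bit seen i steps ago */
    size_t t;

    assert(in != NULL && out != NULL);
    for (t = 0; t < n_in; t++) {
        state = ((state << 1) | (in[t] & 1u)) & 0x7u;
        out[2 * t]     = (unsigned char)parity(state & G1);
        out[2 * t + 1] = (unsigned char)parity(state & G2);
    }
}
```

With these two polynomials the message bits 1, 0, 1, 1 are encoded as 11 01 00 10. A Perseus-style encoder generalizes this to $k$ registers, $n$ outputs and much longer, randomly drawn polynomials.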

Thus the parameters of a convolutional encoder are:

• $k$ and $n$, defining the rate and the number of polynomials,

• $K$, the constraint length (in fact it is related to the internal memory of the encoder),

• the $kn$ polynomials $f_{i,j}(x)$ of degree $N = K - 1$.

The convolutional encoder then describes an $(n, k, N)$-code. Generally, $n$ and $k$ are small integers with $k < n$. The most frequent case is $k = n - 1$. On the contrary, $N$ must be made large enough to achieve low residual decoding error probabilities. The symbols are usually elements of $GF(2)$ but generalization to $GF(q)$, where $q$ is some prime power ($q = p^m$ for some positive integer $m$), can be easily done. We will only consider the case $q = 2$ but all the implementation and results can be generalized to any other $q$. This could be interesting for increasing the encoding speed.

Figure 1 describes a convolutional encoder of rate 1/2.

[Figure 1: Convolutional encoder of rate 1/2]



In the context of Perseus, we will add an artificial noise of parameter $p$ to the (encoded) output sequence $v = v^{(1)}, v^{(2)}, \ldots$

The decoding step is performed through the classical Viterbi algorithm, whose complexity is exponential in $kN$. Hence, generally its use is limited to codes of short length and to reduced encoding rates $k/n$. However in our case, since we completely control the noise (we know exactly where the noise bits are applied while any botnet agent does not), we can work with far higher values.
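To make the exponential cost concrete, here is a minimal hard-decision Viterbi sketch in C for a toy rate-1/2 code with polynomials $1 + x^2$ and $1 + x + x^2$ (an assumption of this sketch, as are the names `viterbi_decode`, `branch_out`, `NSTATES` and `MAXLEN`). It is not the library's decoder. The trellis has $2^{kN}$ states, here $2^2 = 4$; every additional register bit doubles both the table sizes and the work per received symbol.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NSTATES 4   /* 2^(K-1) states for constraint length K = 3 */
#define MAXLEN  256 /* maximum number of message bits in this sketch */

static unsigned parity3(unsigned v) { return (v ^ (v >> 1) ^ (v >> 2)) & 1u; }

/* Output pair (first output in bit 1, second in bit 0) for input bit b
 * leaving state s. State s holds u[t-1] in bit 0 and u[t-2] in bit 1. */
static unsigned branch_out(unsigned s, unsigned b)
{
    unsigned reg = (s << 1) | b; /* bits: u[t], u[t-1], u[t-2] */
    return (parity3(reg & 0x5u) << 1) | parity3(reg & 0x7u);
}

/* Decodes 2*n received bits (one bit per byte) into n message bits.
 * The encoder is assumed to start in state 0; the survivor with the
 * smallest Hamming metric at the end is traced back. */
static void viterbi_decode(const unsigned char *rx, size_t n,
                           unsigned char *msg)
{
    static unsigned char prev[MAXLEN][NSTATES]; /* predecessor states */
    unsigned metric[NSTATES], next[NSTATES];
    unsigned s, b, best;
    size_t t;

    assert(rx != NULL && msg != NULL && n > 0 && n <= MAXLEN);
    for (s = 0; s < NSTATES; s++)
        metric[s] = (s == 0) ? 0 : UINT_MAX / 2;

    for (t = 0; t < n; t++) {
        unsigned r = ((rx[2 * t] & 1u) << 1) | (rx[2 * t + 1] & 1u);
        for (s = 0; s < NSTATES; s++) next[s] = UINT_MAX;
        for (s = 0; s < NSTATES; s++) {
            for (b = 0; b < 2; b++) {
                unsigned ns = ((s << 1) | b) & (NSTATES - 1);
                unsigned d = branch_out(s, b) ^ r;
                unsigned m = metric[s] + (d & 1u) + (d >> 1);
                if (m < next[ns]) {
                    next[ns] = m;
                    prev[t][ns] = (unsigned char)s;
                }
            }
        }
        memcpy(metric, next, sizeof metric);
    }

    best = 0;
    for (s = 1; s < NSTATES; s++)
        if (metric[s] < metric[best]) best = s;
    for (t = n; t-- > 0; ) {
        msg[t] = (unsigned char)(best & 1u); /* input that led to 'best' */
        best = prev[t][best];
    }
}
```

In this toy setting a single flipped bit in the received sequence is corrected. With the constraint lengths used by Perseus the same tables would have millions of states or more, which is why removing the known noise before decoding matters so much.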



2.2 Punctured Convolutional Codes 



Punctured convolutional codes were introduced by Cain et al. [4] as a means of greatly simplifying both Viterbi and sequential decoding of high rate convolutional codes at the expense of a relatively small performance penalty.

A punctured convolutional code $C$ is obtained by periodically deleting output symbols from a (base) $(n, k, N)$-convolutional code $C_b$. Output symbols from $C_b$ are deleted according to a periodic puncturing pattern (or perforation pattern) which can be described by its puncturing matrix:

$$P = \begin{pmatrix} p_{1,1} & \cdots & p_{1,M} \\ \vdots & & \vdots \\ p_{n,1} & \cdots & p_{n,M} \end{pmatrix}$$



A very important problem is that of the reconstruction of such codes (punctured or not). In an attack context, a monitor wants to have access to the transmitted information (the message) without any knowledge of the encoder which produces the intercepted stream (the coded sequence). The only way is to reconstruct the encoder, that is to say to recover all its parameters. A simple decoding then gives access to the message, provided that the channel noise is not too high (no more than a few percent).

Let us consider an $(n, k, N)$-(base) convolutional code $C_b$. A given puncturing pattern $P$ is an $n \times M$ 0-1 matrix with a total of $l$ 1's and $nM - l$ 0's, where $p_{i,j} = 0$ indicates that the $i$-th symbol of every branch in the $j$-th trellis section (of the trellis diagram of $C_b$) is to be deleted.

Then the original code $C_b$, after being punctured with pattern $P$, has become an $(l, kM, m)$-(punctured) code³ $C$ [13].

Let us consider a simple illustrative example.

Example 1 Let us take the (2, 1, 3) code with polynomials

$$(1 + x^2,\; 1 + x + x^2)$$

The two output streams can be denoted as follows:

$$\begin{pmatrix} x_0 & x_1 & x_2 & x_3 & x_4 & x_5 & \cdots \\ y_0 & y_1 & y_2 & y_3 & y_4 & y_5 & \cdots \end{pmatrix}$$

By using the following puncturing pattern:

$$P = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

we then obtain the two following output streams:

$$\begin{pmatrix} x_0 & & x_2 & & x_4 & & \cdots \\ y_0 & y_1 & y_2 & y_3 & y_4 & y_5 & \cdots \end{pmatrix}$$

that we can rearrange as follows:

$$\begin{pmatrix} x_0 & x_2 & x_4 & \cdots \\ y_0 & y_2 & y_4 & \cdots \\ y_1 & y_3 & y_5 & \cdots \end{pmatrix}$$

It then becomes obvious that this puncturing produces a new encoder with three output streams.

By use of polycyclic pseudo-circulant matrices [7], the new parameters are easily defined and we have the 6 following polynomials

$$f_{1,1}(x) = 1 + x \qquad f_{1,2}(x) = 1 + x \qquad f_{1,3}(x) = 1$$

$$f_{2,1}(x) = 0 \qquad f_{2,2}(x) = x \qquad f_{2,3}(x) = 1 + x$$

where $f_{i,j}$ denotes the $j$-th parity-check polynomial applied on input message stream $i$.
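The equivalence claimed in Example 1 can be checked numerically. The sketch below assumes the base polynomials $1 + x^2$ and $1 + x + x^2$ and a pattern that deletes every second bit of the first output stream, which is what the rearranged streams of the example imply; the function names are hypothetical. It encodes the same message once with the punctured (2, 1) encoder and once with the equivalent non-punctured encoder with two inputs and three outputs, so that the two bit streams can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Puncturing matrix: rows = output streams (x, y), columns = time
 * instants within one period (M = 2). A 0 deletes the bit. */
static const unsigned char P[2][2] = { {1, 0}, {1, 1} };

/* Base encoder followed by puncturing. u holds n message bits (0/1);
 * out receives the kept bits in transmission order.
 * Returns the number of bits written. */
static size_t encode_punctured(const unsigned char *u, size_t n,
                               unsigned char *out)
{
    size_t t, w = 0;

    assert(u != NULL && out != NULL);
    for (t = 0; t < n; t++) {
        unsigned char u1 = (t >= 1) ? u[t - 1] : 0;
        unsigned char u2 = (t >= 2) ? u[t - 2] : 0;
        unsigned char x = (unsigned char)(u[t] ^ u2);      /* 1 + x^2     */
        unsigned char y = (unsigned char)(u[t] ^ u1 ^ u2); /* 1 + x + x^2 */
        if (P[0][t % 2]) out[w++] = x;
        if (P[1][t % 2]) out[w++] = y;
    }
    return w;
}

/* Equivalent non-punctured encoder with inputs a1(s) = u[2s] and
 * a2(s) = u[2s+1]. n must be even. */
static size_t encode_equivalent(const unsigned char *u, size_t n,
                                unsigned char *out)
{
    size_t s, w = 0;

    assert(u != NULL && out != NULL && n % 2 == 0);
    for (s = 0; s < n / 2; s++) {
        unsigned char a1 = u[2 * s], a2 = u[2 * s + 1];
        unsigned char a1p = s ? u[2 * s - 2] : 0; /* a1 delayed by one step */
        unsigned char a2p = s ? u[2 * s - 1] : 0; /* a2 delayed by one step */
        out[w++] = (unsigned char)(a1 ^ a1p);       /* f11 = 1+x, f21 = 0   */
        out[w++] = (unsigned char)(a1 ^ a1p ^ a2p); /* f12 = 1+x, f22 = x   */
        out[w++] = (unsigned char)(a1 ^ a2 ^ a2p);  /* f13 = 1,   f23 = 1+x */
    }
    return w;
}
```

For any message of even length both functions produce the same sequence, three bits per pair of message bits. An attacker who reconstructs the code therefore recovers the larger equivalent encoder, not the base code and its pattern.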

As far as Perseus is concerned, the puncturing pattern $P$ is the last parameter to exchange during the initial HTTPS session.

³ In fact, the degree of the punctured code may be less than $N$, but for most interesting punctured codes no degree reduction will take place.
2.3 Reconstruction of Convolutional Codes

Since any punctured convolutional code is equivalent to a non-punctured convolutional code, we will focus on the reconstruction of the latter. As far as code reconstruction is concerned, it is worth mentioning that the use of punctured codes makes it more complex, since the equivalent non-punctured codes have parameters with higher values, for suitable values of $l$, $k$ and $M$.

It is always possible to reconstruct convolutional codes in offline mode. This is basically not a problem since, in most real cases, convolutional encoders do not change very often because they are hardwired (as an example, two convolutional encoders of constraint length 9 are embedded in the UMTS standard [1]). Consequently we can spend a lot of time reconstructing them since the work is done just once. However, there are a very few known cases (most of them for tactical, military communications, as in the Czech army at least during the 90s) where the encoders are randomly generated right before the transmission. The aim is clearly to hinder the code reconstruction strongly, so that it cannot be performed online. In this latter case, except for very small values of parameters and noise probability, the reconstruction is too time-consuming.

The reconstruction of convolutional codes is a highly mathematical topic and consequently we will not present it here (see [8, 3] for an exhaustive study). For our purposes, it is only necessary to recall the most significant results with respect to the reconstruction of convolutional codes.

While it is always possible to make the probability of false alarm (i.e. of reconstructing a wrong encoder) tend towards zero, the probability of success depends on many factors, among which the noise parameter has the most significant impact. Beyond 2-3 % the reconstruction will fail unless a large amount of encoded sequence is available and/or a lot of time and machine resources are spent. In most practical cases, the Viterbi decoding itself is likely to fail for a few percent of noise (less than 0.05), long before the reconstruction process does. Expressing the reconstruction probability of success is not easy from a mathematical point of view and we advise the reader to refer to [8, 3]. Experiments have confirmed that the reconstruction is bound to fail as soon as $p > 3\%$ unless a lot of time and computing power are spent.

As for the computational complexity of the reconstruction, the general result
[3, 12] states that for an $(n, k, N)$-convolutional code, the lower bound is equal to
0{a X X A^^) where a{p) is a quantity which grows exponentially with the 
noise probability $p$ [3, Section 2.3.2].

To illustrate that general result, Table 1 gives a few experimental results [6, 3] for a few encoders in the case of noise levels of $10^{-2}$ and $2 \cdot 10^{-2}$ (Additive White Gaussian noise).

Encoder      Reconstruction time       Reconstruction time
             (p = 10^-2)               (p = 2.10^-2)
(4, 3, 8)    7 min 12 sec              Not detected
(4, 3, 9)    6 min 16 sec              Not detected

Table 1: Example of reconstruction time (on a Pentium IV 2.0 GHz) for two noise levels



As a consequence, choosing a rather high level of noise prevents the reconstruction from succeeding unless a huge computing time (several hours at least) is devoted to it. We will therefore choose a noise level ranging from 0.15 to 0.35.

Let us mention that Perseus technology considers (and implements) the worst case of communication channel model with respect to the reconstruction problem: the Additive White Gaussian model, in which the noise is applied uniformly (in other words the noise bits are independent, identically distributed random variables). In real communications (for instance satellite communications) the noise occurs in bursts and different channel models must be considered (e.g. the Gilbert-Elliott model [12]).



3 Presentation of the PERSEUS Library 

The library includes two main files: 

• A header file perseus.h which contains the parameter settings, new type definitions and function prototypes.

• A function file perseus.c which contains the C code of the different functions: random encoder generation, encoding procedure, decoding procedure...

Additionally, different files are also provided with the library:

• A test program perseus_test.c which shows how to implement and use the Perseus library.

• A makefile to compile the previous test file.

• A documentation file howto_libperseus.pdf and a comprehensive description of the library structure and functionalities, produced from the source code by means of the doxygen utility.

The official code repository is located at code.google.com/p/libperseus. The current stable version is 1.0.0.
3.1 Setting Perseus Parameters



Perseus parameters are defined in the perseus.h file so as to provide the best trade-off between security and performance. The reader who wishes to modify those parameters must keep in mind that some of them have an impact on the residual decoding error. So any modification should be considered only by programmers having a rather good knowledge of convolutional encoding and Viterbi decoding theory [13].

The main parameters are generated randomly during the encoder generation. So only a lower bound Xmin and an upper bound Xmax are set in order to define a value interval [Xmin; Xmax + Xmin].
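A draw from such an interval can be sketched as follows. This is a hedged illustration only: `alea_sketch` is a hypothetical stand-in for the library's `alea()` function (assumed here to return a uniform value in [0, 1)), `draw_param` is a name invented for this sketch, and `rand()` is used purely for brevity, not because it is suitable for generating secret parameters.

```c
#include <assert.h>
#include <stdlib.h>

#define KMIN_GEN 1
#define KMAX_GEN 5

/* Hypothetical stand-in for alea(): uniform double in [0, 1). */
static double alea_sketch(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Draws an integer uniformly in [xmin, xmin + xmax]. */
static int draw_param(int xmin, int xmax)
{
    assert(xmin >= 0 && xmax >= 0);
    return xmin + (int)((double)(xmax + 1) * alea_sketch());
}
```

Calling `draw_param(KMIN_GEN, KMAX_GEN)` yields a number of encoder inputs between 1 and 6, the default interval quoted for that parameter.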

3.1.1 Encoder inputs 

The number of encoder inputs is given by (default values [1; 6]):

    #define KMIN_GEN 1
    #define KMAX_GEN 5

3.1.2 Encoder outputs

The number of encoder outputs is defined by (default values [5; 11]):

    #define NMIN_GEN 5
    #define NMAX_GEN 6

3.1.3 Constraint length (encoder memory)

The size of the encoder memory (which also determines the degree of the encoder polynomials) is defined by (default [20; 30]):

    #define MIN_CONT 10
    #define MAX_CONT 20

3.1.4 Puncturing matrix width

The width of the puncturing matrix, whose height is defined by the value N ∈ [NMIN_GEN; NMIN_GEN + NMAX_GEN], is defined by (default [6; 21]):

    #define MIN_MATWIDTH 6
    #define MAX_MATWIDTH 16

The puncturing level is defined by the number of null entries of that matrix. This number is defined as follows:

    /* Random generation of the puncturing matrix weight */
    /* (code->mN*code->mMatWidth - nbzero)               */
    /* where nbzero = (code->mN*code->mMatWidth/8)       */
    nbzero = (int)((float)((code->mN * code->mMatWidth) >> 3));
    code->mMatDepth = (code->mN * code->mMatWidth) - nbzero;

Let us notice that it is possible to adapt the weight of the puncturing matrix according to the values of N and mMatWidth. For rather large values of their product it is possible to divide by 16 or even 32 to avoid decoding errors arising on low-memory computers. Perseus library 2.x will implement such optimizations along with combinatorial puncturing patterns.
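As a rough illustration of the weight rule, the sketch below fills an n x width matrix with ones and then zeroes one eighth of its entries at random positions. The function name, the row-major layout and the use of `rand()` are assumptions of this sketch; the way the library actually places the null entries is not described here.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills matrix (n * width entries, row-major) with ones, then sets
 * nbzero = (n * width) / 8 randomly chosen entries to zero.
 * Returns the matrix weight, i.e. n * width - nbzero. */
static unsigned gen_punct_matrix(unsigned char *matrix, unsigned n,
                                 unsigned width)
{
    unsigned total = n * width;
    unsigned nbzero = total >> 3;
    unsigned i, placed = 0;

    assert(matrix != NULL && total > 0);
    for (i = 0; i < total; i++) matrix[i] = 1;
    while (placed < nbzero) {
        unsigned pos = (unsigned)rand() % total;
        if (matrix[pos]) { /* skip positions that are already zero */
            matrix[pos] = 0;
            placed++;
        }
    }
    return total - nbzero;
}
```

For a 6 x 8 matrix this gives 6 null entries and a weight of 42. Dividing by 16 or 32 instead of 8 amounts to replacing the shift by 3 with a shift by 4 or 5.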

3.2 Perseus Security Parameters 

There is only one parameter which has a direct impact on the Perseus security with respect to the problem of reconstructing the encoder from noisy sequences. This parameter ensures that this problem remains hard in practice, requiring huge supercomputing power during several days or even weeks for a single encoder.

This parameter is defined in the Gen_Noise_Generator function located in the perseus.c file.

    /* Noise probability generation ([0.15, 0.35]) */
    aNGen->proba = 15 + (int)(20.0 * alea());

A noise probability close to 0.15 will on average require a reconstruction time in days, while a probability close to 0.35 will require weeks or even months of computing time.

The reader must be aware that a noise probability close to 0.50 is not possible in the context of Perseus. Such a probability relates to cryptography, not to noisy communications.

3.3 Perseus Noise Generator 

In Perseus library 1.0.0, the noise generator is fixed (it will be randomly generated from versions 2.x onwards). This generator is a biased stream cipher (of the combining generator [5] class). It is initialized by a random 102-bit key which fills up the four linear feedback shift registers (LFSR) at time instant $t = 0$. It is worth noticing that the size of the key only serves to prevent an exhaustive search (by which the attacker would remove the noise). Hence the only possible approach is to reconstruct the encoder in the context of a noisy communication.

The four LFSR polynomials are defined in the perseus.h file as follows:

    /* Noise generator feedback polynomial 1 */
    #define POLY1 0x47E07L
    #define MASK1 0x7FFFFL
    #define LR1   19

    /* Noise generator feedback polynomial 2 */
    #define POLY2 0x1772AFL
    #define MASK2 0x7FFFFFL
    #define LR2   23

    /* Noise generator feedback polynomial 3 */
    #define POLY3 0x1C95269L
    #define MASK3 0x1FFFFFFFL
    #define LR3   29

    /* Noise generator feedback polynomial 4 */
    #define POLY4 0x43E98841L
    #define MASK4 0x7FFFFFFFL
    #define LR4   31



The biased filtering Boolean function which outputs the additive noise to combine with the encoded sequence is then defined by

    /* Noise probability generation ([0.15, 0.35]) */
    aNGen->proba = 15 + (int)(20.0 * alea());

    /* Boolean filtering function generation */
    w = 0;
    aNGen->Bf = (unsigned char *)calloc(16, sizeof(unsigned char));
    for (w = 0; w < 16; w++)
    {
        val = (int)(99.0 * alea());
        if (val < aNGen->proba) aNGen->Bf[w] = 1;
    }
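The following sketch shows how four LFSRs and a 16-entry Boolean table can be combined into a deterministic noise source, and why a recipient holding the same key can remove the noise exactly. It is an illustration under assumptions, not the library's code: the type and function names are invented, the feedback convention in `lfsr_step` is a guess, and the constants are simply the values read from the defines above.

```c
#include <assert.h>
#include <stddef.h>

/* One LFSR: current state, feedback taps and length mask. */
typedef struct { unsigned long reg, poly, mask; } lfsr_t;

typedef struct {
    lfsr_t r[4];
    unsigned len[4];
    unsigned char bf[16]; /* truth table of the combining function */
} noise_gen_t;

/* Clocks the register once and returns the bit shifted out. The tap
 * convention (parity of reg & poly fed back at the top) is an
 * assumption of this sketch. */
static unsigned lfsr_step(lfsr_t *r, unsigned len)
{
    unsigned long v = r->reg & r->poly;
    unsigned fb = 0, out = (unsigned)(r->reg & 1UL);

    while (v) { fb ^= (unsigned)(v & 1UL); v >>= 1; }
    r->reg = ((r->reg >> 1) | ((unsigned long)fb << (len - 1))) & r->mask;
    return out;
}

/* Loads the four key words and the Boolean function table. */
static void noise_init(noise_gen_t *g, const unsigned long key[4],
                       const unsigned char bf[16])
{
    static const unsigned long poly[4] =
        {0x47E07UL, 0x1772AFUL, 0x1C95269UL, 0x43E98841UL};
    static const unsigned long mask[4] =
        {0x7FFFFUL, 0x7FFFFFUL, 0x1FFFFFFFUL, 0x7FFFFFFFUL};
    static const unsigned len[4] = {19, 23, 29, 31};
    unsigned i;

    assert(g != NULL && key != NULL && bf != NULL);
    for (i = 0; i < 4; i++) {
        g->r[i].reg = key[i] & mask[i];
        g->r[i].poly = poly[i];
        g->r[i].mask = mask[i];
        g->len[i] = len[i];
    }
    for (i = 0; i < 16; i++) g->bf[i] = (unsigned char)(bf[i] & 1u);
}

/* One noise bit: the four LFSR outputs index the Boolean table. */
static unsigned noise_bit(noise_gen_t *g)
{
    unsigned idx = 0, i;

    for (i = 0; i < 4; i++) idx |= lfsr_step(&g->r[i], g->len[i]) << i;
    return g->bf[idx];
}

/* XORs the noise stream onto n bits (one bit per byte) in place. */
static void apply_noise(noise_gen_t *g, unsigned char *bits, size_t n)
{
    size_t t;

    assert(g != NULL && bits != NULL);
    for (t = 0; t < n; t++) bits[t] ^= (unsigned char)noise_bit(g);
}
```

Because the noise is a deterministic function of the key, applying `apply_noise` a second time with a generator initialized from the same key and table restores the original bits. The proportion of ones in the table sets the noise probability, which is how a value between 0.15 and 0.35 is obtained.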



4 Implementation of the PERSEUS Library 

Using the Perseus library is straightforward (Figure 2). To illustrate this, a sample test file perseus_test.c
is provided with the library [TH]. We are going to detail the whole process 
as it is described in the library howto file. Let us mention that, since the library uses dynamic Viterbi decoding (which may be memory-consuming depending on the instances of Perseus parameters), the decoding may fail if you choose to process a large amount of data on a computer with limited memory. We strongly advise splitting data into chunks of less than 2 Kb. The next version of the library (from versions 2.x) will use polynomial-time decoding and therefore this limitation will no longer exist.
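The chunking advice can be followed with a small helper such as the one sketched below. The names `for_each_chunk`, `chunk_fn`, `count_bytes` and `CHUNK_MAX` are hypothetical, and the limit is interpreted here as 2047 bytes; in real use the callback would call the library's encoding function on each chunk.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_MAX 2047u /* strictly less than 2 Kb (taken here as bytes) */

/* Callback type: processes one chunk, returns nonzero on success. */
typedef int (*chunk_fn)(const unsigned char *chunk, size_t len, void *ctx);

/* Splits data into chunks of at most CHUNK_MAX bytes and hands each one
 * to fn. Returns the number of chunks processed, or 0 on failure. */
static size_t for_each_chunk(const unsigned char *data, size_t size,
                             chunk_fn fn, void *ctx)
{
    size_t off = 0, count = 0;

    assert(data != NULL && fn != NULL);
    while (off < size) {
        size_t len = size - off;
        if (len > CHUNK_MAX) len = CHUNK_MAX;
        if (!fn(data + off, len, ctx)) return 0;
        off += len;
        count++;
    }
    return count;
}

/* Example callback: only accumulates the number of bytes seen. */
static int count_bytes(const unsigned char *chunk, size_t len, void *ctx)
{
    (void)chunk;
    *(size_t *)ctx += len;
    return 1;
}
```

A 5000-byte buffer is split into three chunks this way (2047, 2047 and 906 bytes), each small enough for the dynamic Viterbi decoder on the receiving side.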

Let us suppose that the data to protect are stored in the array data. John Doe from the USA wants to send them to Jean Martin in France in a secure way.

[Figure 2: Implementation structure of the Perseus library. Diagram labels: John Doe, Jean Martin, Parameter generation, HTTPS, Get parameters, Communication set up, Encode data, Decode data, Processing data.]



On John Doe's side, the main steps are (in the following order):

1. First generating the encoder, the noise generator and the noise generator secret key randomly.

    /* Generate the PCC encoder */
    Pcc = generateCode();

    /* Noise generator secret key generation */
    aKey = (INIT_NOISE_GEN *)calloc(1, sizeof(INIT_NOISE_GEN));
    aKey->INIT1 = (unsigned long int)((float)(0xFFFFFFFFL) * alea());
    aKey->INIT2 = (unsigned long int)((float)(0xFFFFFFFFL) * alea());
    aKey->INIT3 = (unsigned long int)((float)(0xFFFFFFFFL) * alea());
    aKey->INIT4 = (unsigned long int)((float)(0xFFFFFFFFL) * alea());

    /* Noise generator variable allocation */
    NGen = (NOISE_GEN *)calloc(1, sizeof(NOISE_GEN));

    /* Noise generator init */
    if (!Gen_Noise_Generator(NGen, aKey))
    {
        perror("Noise encoder generation on error!");
        free(NGen);
        exit(0);
    }

2. Sending the secret elements to Jean Martin through an HTTPS session (or any equivalent secure channel). This part is not implemented in the perseus_test.c file (it is straightforward to implement). The secret elements are the PCC encoder and the noise generator secret key. They consist of three structures (defined in the file perseus.h):

    /* Generic type for Punctured Convolutional Code */
    typedef struct
    {
        unsigned int mN;          /* Number of output bits     */
        unsigned int mK;          /* Number of input bits      */
        unsigned int mM;          /* Encoder memory size       */
        unsigned long **mPoly;    /* Encoder polynomials       */
        unsigned int mMatWidth;   /* Puncturing matrix width   */
        unsigned char *mMatrix;   /* Encoder puncturing matrix */
        unsigned int mMatDepth;   /* Puncturing matrix weight  */
    } PUNCT_CONC_CODE;

    /* Generic type for a noise generator */
    typedef struct
    {
        unsigned long int Reg1;   /* Linear Feedback Shift Register 1 */
        unsigned long int Reg2;   /* Linear Feedback Shift Register 2 */
        unsigned long int Reg3;   /* Linear Feedback Shift Register 3 */
        unsigned long int Reg4;   /* Linear Feedback Shift Register 4 */
        unsigned int L1;          /* Length of LFSR 1 */
        unsigned int L2;          /* Length of LFSR 2 */
        unsigned int L3;          /* Length of LFSR 3 */
        unsigned int L4;          /* Length of LFSR 4 */
        unsigned char *Bf;        /* Combining Boolean function */
        unsigned int proba;       /* Noise probability */
    } NOISE_GEN;

    /* Generic type for noise generator secret key */
    typedef struct
    {
        unsigned long int INIT1;
        unsigned long int INIT2;
        unsigned long int INIT3;
        unsigned long int INIT4;
    } INIT_NOISE_GEN;



3. Encoding the data

    encoded_data_size = 0L;
    if (!pcc_Code(Pcc, data, data_size, &encoded_data,
                  &encoded_data_size, NGen, aKey))
    {
        perror("Encoding error\n");
        exit(0);
    }

    printf("Data after encoding = %s\n", encoded_data);

The PCC encoding includes all basic steps (character to binary encoding, the PCC coding itself, data puncturing right after the encoding, the binary to hex nibbles encoding, the addition of deterministic noise). The final result of the PCC encoding is contained in the array encoded_data.

4. John Doe sends the encoded data to Jean Martin.

On Jean Martin's side, the steps are:

1. Reception of the secret elements (PCC encoder and noise generator secret key) from John Doe through an HTTPS session. The three corresponding data structures (see John Doe's step 2 above) are then initialized. This part is not implemented in the perseus_test.c file (it is straightforward to implement).

2. Decode data

    dataLength = 0L;
    if (!pcc_decode(Pcc, NGen, aKey, encoded_data,
                    encoded_data_size, &dataDecoded, &dataLength))
    {
        perror("Decoding error\n");
        exit(1);
    }

The PCC decoding step includes all basic processing steps (removal of the deterministic noise, hex nibble to binary transcoding, data unpuncturing and Viterbi decoding). Encoded data are in the array encoded_data while decoded data are contained in the array dataDecoded.

5 Experimental Results 

We have tested our implementation of the PERSEUS library on an Intel Core2 Duo CPU P8400 (2.26 GHz) with 2 GB of RAM. Data have been processed by chunks of 1 or 2 Kb. Of course the performance depends on the random instances of encoders. The main bottleneck remains the dynamic Viterbi decoding, which takes most of the processing time (more than 70 % of the total time) and of the available memory. However average performance is rather good. Let us notice that the current release (1.0.0) has deliberately not been optimized, in order to preserve code readability.

The next version of the Perseus library will use polynomial-time decoding while requiring a negligible amount of memory.

5.1 Perseus Entropy Profile 

In order to illustrate the fact that PERSEUS-protected data may exhibit an entropy profile which is close to that of plain (unprotected) data, we have computed the average entropy per byte on several files (in different Indo-European languages). Table 2 summarizes the results.



Noise          Plain data   PERSEUS-protected   Encrypted
probability                 data                data
5 %            4.21         4.96                8.00
10 %           4.21         6.19                8.00
15 %           4.21         6.46                8.00
20 %           4.21         7.11                8.00
25 %           4.21         7.39                8.00
30 %           4.21         7.45                8.00
35 %           4.21         7.71                8.00

Table 2: Average entropy profile (average entropy per byte) for plain, PERSEUS-protected and AES encrypted data



These results clearly show that the entropy profile depends on the noise level 
(which is quite obvious). Our tests have also confirmed that the more complex 
the encoder is (in terms of redundancy added) the lower the entropy profile is. 
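The measure used in Table 2 can be reproduced with a few lines of C. The sketch below computes the Shannon entropy of the byte histogram of a buffer, in bits per byte. It is an assumption that the reported figures were computed exactly this way; the function names are invented, and the small `log2_sketch` helper only exists to keep the example free of any dependency on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Base-2 logarithm for x > 0 without libm: reduce x to m in [1, 2),
 * then use ln(m) = 2 * (y + y^3/3 + y^5/5 + ...), y = (m-1)/(m+1). */
static double log2_sketch(double x)
{
    const double ln2 = 0.69314718055994530942;
    double e = 0.0, y, y2, term, sum = 0.0;
    int k;

    assert(x > 0.0);
    while (x >= 2.0) { x *= 0.5; e += 1.0; }
    while (x < 1.0)  { x *= 2.0; e -= 1.0; }
    y = (x - 1.0) / (x + 1.0);
    y2 = y * y;
    term = y;
    for (k = 1; k < 60; k += 2) { sum += term / k; term *= y2; }
    return e + 2.0 * sum / ln2;
}

/* Shannon entropy of the byte histogram of buf, in bits per byte
 * (0 for a constant buffer, 8 for a uniform byte distribution). */
static double entropy_per_byte(const unsigned char *buf, size_t n)
{
    size_t hist[256] = {0};
    double h = 0.0;
    size_t i;

    assert(buf != NULL && n > 0);
    for (i = 0; i < n; i++) hist[buf[i]]++;
    for (i = 0; i < 256; i++) {
        if (hist[i]) {
            double p = (double)hist[i] / (double)n;
            h -= p * log2_sketch(p);
        }
    }
    return h;
}
```

A detector based on this quantity flags anything close to 8 bits per byte as probably encrypted, which is the situation of the last column of Table 2. Protected data whose entropy stays well below that threshold is harder to single out.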

Let us recall that convolutional code reconstruction is intractable (in a
reasonable amount of time) as soon as the noise probability exceeds a few
percent (in practice, > 0.02). So if we want to lower the entropy profile, we
can use a noise probability of 5 % while preserving the scalable security
provided by the Perseus approach.
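
The per-byte entropy measurement discussed in this section can be sketched as
follows. This is a minimal Python illustration and not the code used to
produce Table 2: the function names byte_entropy and flip_bits are
hypothetical, and flip_bits only mimics the noise step by flipping bits of raw
data; it performs no convolutional encoding.

```python
import math
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def flip_bits(data: bytes, p: float, seed: int = 0) -> bytes:
    """Flip each bit independently with probability p (assumed noise model)."""
    rng = random.Random(seed)        # fixed seed so that runs are repeatable
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < p:
                out[i] ^= 1 << bit
    return bytes(out)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    for p in (0.0, 0.05, 0.10, 0.20, 0.35):
        print(f"p = {p:.2f}  entropy = {byte_entropy(flip_bits(text, p)):.2f}")
```

A value of 8.00 corresponds to a uniform distribution over the 256 byte values,
which is what well-encrypted data approach; natural-language text uses far
fewer byte values and therefore scores much lower.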

5.2 Secure Programming 

Throughout the programming process, code security was a priority and received
the utmost attention. Once the Perseus library was completed, we audited the
code with respect to security.

We first applied the Flawfinder utility [TD], which tracks insecure
programming constructs. It helps prevent buffer overflows, heap overflows and
similar flaws by checking the nature and use of common functions. In a second
step, we analyzed how efficiently and correctly the Perseus library uses
memory. For that purpose, we used the Valgrind utility [TS].

As a result, the C code of the Perseus library complies with the existing
rules of secure programming and hence does not introduce weaknesses or flaws
that could be exploited for attack purposes.

6 Applications and Implementations 

At the present time, a few implementations and applications of the Perseus
technology are known. We hope that new contributors will volunteer to develop
new ones.

The DFT Technologic company (http://www.dft-techno.com) has decided to provide
industrial support for the Perseus technology and to help promote the
research and development effort around it.

6.1 Firefox Plug-in 

This project is managed by Eddy Deligne [TJ. He has applied the Perseus
technology to protect the HTTP protocol (GET and POST methods) when using
Firefox [5]. This solution takes the form of a C++ Firefox plug-in
developed under the triple GPL/LGPL/MPL licences and complying with the
specifications of Mozilla development, thus allowing the code to be merged
directly into the Firefox engine code. This plug-in is available with the
corresponding server (Linux, Windows), thus providing an all-round solution
(client/server architecture).

At the present time, all Firefox 3.x versions are covered (Windows, Linux,
Apple). Support for the new Firefox 4.x should follow very soon (many
structural changes have occurred with this new version, thus requiring
significant changes in the Perseus plug-in).
6.2 Andromeda Library: Protecting the Torrent Protocol

Fabien Jobin [2] has developed the Andromede library, which implements
the BitTorrent protocol in its original version (i.e. without any additional
third-party functionality except one devoted to extension management). In the
Andromede library, the BitTorrent traffic is protected by the Perseus
technology.

7 Conclusion and Future Works 

The Perseus technology intends to propose a new trend in information and
communication security. The concept of scalable security should help reconcile
the needs of national security with citizens' natural right to privacy.
This technology preserves the ability of state intelligence agencies to access
PERSEUS-protected data. Indeed, the noisy encoding layer can always be
processed at the price of an offline, time-consuming computing step. Only
national security agencies and specialized police departments have suitable
computing power. But since breaking this technology requires a lot of time,
the number of attempts will be limited to processing the communications of
really bad guys only, and not those of ordinary citizens.

Current research and development activities around the Perseus technology
consider the protection of voice and phone communications as well as file
protection:

• development and implementation of VoIP platforms;

• development of Android modules and apps to provide communication
protection for various kinds of data (voice, SMS, MMS, ...);

• development of Linux/Windows applications to protect files on hard disk.

The main difficulty here lies in the Viterbi decoding, which is the most
time-consuming part. However, our recent research results on a new decoding
algorithm with polynomial complexity are very promising. Such an algorithm
would speed up the decoding step significantly, thus opening a lot of
opportunities with respect to the Perseus technology.

Finally, our current work focuses on additional plug-ins intended, first, to
lower the entropy profile of PERSEUS-protected data in order to bring it much
closer to that of plain data and, second, to make their entropy profile and
statistical features resemble those of arbitrary data (image files, PDF
files, ...).